Mapping of Unicode character planes


Unicode characters can be categorized in many different ways. Unicode code points are logically divided into 17 planes, each with 65,536 (= 2^16) code points, although currently only a few planes are used:
  • Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.
  • Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP).
  • Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
  • Planes 3 to 13 (30000–DFFFF) are unassigned
  • Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
  • Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
  • Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes would remain available if previously unknown scripts with tens of thousands of characters were discovered. The limit of 17 planes (1,114,112 code points, a little over 20 bits) is therefore unlikely to be reached in the near future.
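
Because each plane holds exactly 2^16 code points, the plane number of a code point is its value divided by 65,536, which is the same as the bits above the low 16. A minimal Python sketch of that arithmetic follows. The names plane_of, describe and PLANE_NAMES are illustrative only, and the abbreviations are taken from the list above.

```python
PLANE_SIZE = 0x10000          # 65,536 code points per plane
PLANE_COUNT = 17              # planes 0 through 16
MAX_CODE_POINT = PLANE_SIZE * PLANE_COUNT - 1   # 0x10FFFF

# Abbreviations from the list above; planes 3 to 13 have no name there.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 14: "SSP", 15: "PUA", 16: "PUA"}

def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16   # same as code_point // PLANE_SIZE

def describe(code_point: int) -> str:
    """Format a code point together with its plane number and abbreviation."""
    plane = plane_of(code_point)
    name = PLANE_NAMES.get(plane, "unassigned")
    return f"U+{code_point:04X} is in plane {plane} ({name})"

print(describe(0x0041))    # Latin capital letter A
print(describe(0x1D11E))   # musical symbol G clef
print(describe(0x20000))   # first code point of plane 2
```

The shift works only because the plane size is a power of two; integer division by PLANE_SIZE gives the same result.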

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
As of Unicode 5.0, the BMP includes the following blocks:
  • Basic Latin (0000–007F)
  • Latin-1 Supplement (0080–00FF)
  • Latin Extended-A (0100–017F)
  • Latin Extended-B (0180–024F)
  • IPA Extensions (0250–02AF)
  • Spacing Modifier Letters (02B0–02FF)
  • Combining Diacritical Marks (0300–036F)
  • Greek and Coptic (0370–03FF)
  • Cyrillic (0400–04FF)
  • Cyrillic Supplement (0500–052F)
  • Armenian (0530–058F)
  • Hebrew (0590–05FF)
  • Arabic (0600–06FF)
  • Syriac (0700–074F)
  • Arabic Supplement (0750–077F)
  • Thaana (0780–07BF)
  • N'Ko (Mandenkan) (07C0–07FF)
  • Indic scripts (0900–0DFF)
  • Thai (0E00–0E7F)
  • Lao (0E80–0EFF)
  • Tibetan (0F00–0FFF)
  • Burmese (1000–109F)
  • Georgian (10A0–10FF)
  • Hangul Jamo (1100–11FF)
  • Ethiopic (1200–137F)
  • Ethiopic Supplement (1380–139F)
  • Cherokee (13A0–13FF)
  • Unified Canadian Aboriginal Syllabics (1400–167F)
  • Ogham (1680–169F)
  • Runic (16A0–16FF)
  • Philippine scripts (1700–177F)
  • Khmer (1780–17FF)
  • Mongolian (1800–18AF)
  • Limbu (1900–194F)
  • Tai Le (1950–197F)
  • New Tai Lue (1980–19DF)
  • Khmer Symbols (19E0–19FF)
  • Buginese (1A00–1A1F)
  • Balinese (1B00–1B7F)
  • Lepcha (Rong) (1C00–1C4F)
  • Phonetic Extensions (1D00–1D7F)
  • Phonetic Extensions Supplement (1D80–1DBF)
  • Combining Diacritical Marks Supplement (1DC0–1DFF)
  • Latin Extended Additional (1E00–1EFF)
  • Greek Extended (1F00–1FFF)
  • Symbols (2000–2BFF)
  • Glagolitic (2C00–2C5F)
  • Latin Extended-C (2C60–2C7F)
  • Coptic (2C80–2CFF)
  • Georgian Supplement (2D00–2D2F)
  • Tifinagh (2D30–2D7F)
  • Ethiopic Extended (2D80–2DDF)
  • Supplemental Punctuation (2E00–2E7F)
  • CJK Radicals Supplement (2E80–2EFF)
  • Kangxi Radicals (2F00–2FDF)
  • Ideographic Description Characters (2FF0–2FFF)
  • CJK Symbols and Punctuation (3000–303F)
  • Hiragana (3040–309F)
  • Katakana (30A0–30FF)
  • Bopomofo (3100–312F)
  • Hangul Compatibility Jamo (3130–318F)
  • Kanbun (3190–319F)
  • Bopomofo Extended (31A0–31BF)
  • CJK Strokes (31C0–31EF)
  • Katakana Phonetic Extensions (31F0–31FF)
  • Enclosed CJK Letters and Months (3200–32FF)
  • CJK Compatibility (3300–33FF)
  • CJK Unified Ideographs Extension A (3400–4DBF)
  • Yijing Hexagram Symbols (4DC0–4DFF)
  • CJK Unified Ideographs (4E00–9FFF)
  • Yi Syllables (A000–A48F)
  • Yi Radicals (A490–A4CF)
  • Modifier Tone Letters (A700–A71F)
  • Latin Extended-D (A720–A7FF)
  • Syloti Nagri (A800–A82F)
  • Phags-pa (A840–A87F)
  • Hangul Syllables (AC00–D7AF)
  • High Surrogates (D800–DB7F)
  • High Private Use Surrogates (DB80–DBFF)
  • Low Surrogates (DC00–DFFF)
  • Private Use Area (E000–F8FF)
  • CJK Compatibility Ideographs (F900–FAFF)
  • Alphabetic Presentation Forms (FB00–FB4F)
  • Arabic Presentation Forms-A (FB50–FDFF)
  • Variation Selectors (FE00–FE0F)
  • Vertical Forms (FE10–FE1F)
  • Combining Half Marks (FE20–FE2F)
  • CJK Compatibility Forms (FE30–FE4F)
  • Small Form Variants (FE50–FE6F)
  • Arabic Presentation Forms-B (FE70–FEFF)
  • Halfwidth and Fullwidth Forms (FF00–FFEF)
  • Specials (FFF0–FFFF)

Future additions

Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts, and their proposed code point ranges, are:
  • Cham (18B0–18FF)
  • Lanna (Old Tai Lue) (1A80–1AEF)
  • Santali (Ol Cemet' / Ol Chiki) (2DE0–2DFF)
  • Vai (A500–A61F)
  • Saurashtra (AB00–AB5F)
Several other scripts are proposed for inclusion in the BMP, including:
  • Avestan (0800–083F)
  • Pahlavi (0840–087F)
  • Batak (1A20–1A5F)
  • Meitei Mayek / Meitei (1C80–1CDF)
  • Varang Kshiti (AA00–AA3F)
  • Sorang Sompeng (AA40–AA6F)
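
The High Surrogates, High Private Use Surrogates and Low Surrogates blocks listed above (D800–DFFF) hold no characters of their own. UTF-16 pairs one code unit from the high range with one from the low range to represent each code point in planes 1 to 16. The sketch below shows the standard arithmetic; the function names are illustrative and do not come from any library.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point (planes 1-16) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1D11E)     # musical symbol G clef, plane 1
print(f"{high:04X} {low:04X}")
```

Code points in planes 15 and 16 always produce a high surrogate in DB80–DBFF, which is why that sub-range is listed separately as High Private Use Surrogates.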

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.
As of Unicode 5.0, Plane 1 includes the following blocks:
  • Linear B Syllabary (10000–1007F)
  • Linear B Ideograms (10080–100FF)
  • Aegean Numbers (10100–1013F)
  • Ancient Greek Numbers (10140–1018F)
  • Old Italic (10300–1032F)
  • Gothic (10330–1034F)
  • Ugaritic (10380–1039F)
  • Old Persian (103A0–103DF)
  • Deseret (10400–1044F)
  • Shavian (10450–1047F)
  • Osmanya (10480–104AF)
  • Cypriot Syllabary (10800–1083F)
  • Phoenician (10900–1091F)
  • Kharoshthi (10A00–10A5F)
  • Sumero-Akkadian Cuneiform (12000–1236E and 12400–12473)
  • Byzantine Musical Symbols (1D000–1D0FF)
  • Musical Symbols (1D100–1D1FF)
  • Ancient Greek Musical Notation (1D200–1D24F)
  • Tai Xuan Jing Symbols (1D300–1D35F)
  • Mathematical Alphanumeric Symbols (1D400–1D7FF)
Many other scripts are proposed for inclusion in Plane 1, including:
  • Old Permic
  • Meroitic
  • Manichaean
  • Balti
  • Aramaic
  • South Arabian
  • Brahmi
  • Soyombo
  • Indus script
  • Tengwar
  • Cirth
  • Blissymbols
  • Basic Egyptian Hieroglyphics
  • Rod Numerals

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 Unified Han ideographs that have seldom been used in everyday written communication.

Unused planes

Unicode has not yet assigned any characters to Planes 3 through 13. These planes are not expected to be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside the context of writing systems is potentially limitless. The UCS and Unicode take requests for symbols on a case-by-case basis.
Plane 3, tentatively named the Tertiary Ideographic Plane (TIP), is currently planned to be used for Old Hanzi and Oracle Bone characters.

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters in two blocks of 128 and 240 characters. The first block is for language tag characters, for use when language cannot be indicated through other protocols.

Private use planes

Two planes (planes 15 and 16) have been set aside for character assignment by parties outside the ISO and the Unicode Consortium. Use of such characters will have limited interoperability: software and fonts that support Unicode will not necessarily support character assignments made by other parties. In particular, if the characters have unusual properties, such as right-to-left directionality, other implementations may treat them inappropriately.
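
Software that receives text from other parties may want to detect private use code points before relying on their properties. A rough sketch follows, assuming Python and its standard unicodedata module. The helper name is_private_use is hypothetical, and the ranges combine the BMP Private Use Area (E000–F8FF) with planes 15 and 16.

```python
import unicodedata

# Private use ranges: the BMP Private Use Area plus planes 15 and 16.
# The last two code points of every plane are noncharacters, so the
# plane 15 and plane 16 ranges stop at ...FFFD.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(code_point: int) -> bool:
    """Return True if code_point falls in one of the private use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)

# Cross-check a few values against the general category "Co" (private use).
for cp in (0x0041, 0xE000, 0xF0000, 0x10FFFD):
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X} private use: {is_private_use(cp)} (category {category})")
```

A check like this only says that a code point has no standard meaning. How it should be displayed or ordered still depends on a private agreement between sender and receiver.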





  • Copyright © 2007-8 totallyexplained.com | Licensed under the GNU Free Documentation License | Site Map
    This article contains text from the Wikipedia article Mapping of Unicode character planes (History) and is released under the GFDL | RSS Version